Skip to content

dataset_utils: add mask_out_tokens to train_on_responses_only (fixes unslothai/unsloth#6695)#852

Open
pjordanandrsn wants to merge 2 commits into
unslothai:mainfrom
pjordanandrsn:fix/6695-mask-out-tokens
Open

dataset_utils: add mask_out_tokens to train_on_responses_only (fixes unslothai/unsloth#6695)#852
pjordanandrsn wants to merge 2 commits into
unslothai:mainfrom
pjordanandrsn:fix/6695-mask-out-tokens

Conversation

@pjordanandrsn

Copy link
Copy Markdown

Implements the feature requested in unslothai/unsloth#6695: a way to keep specific tokens —
canonically the </think> closer — out of the loss even inside kept response spans (the
Nemotron-Ultra recipe the issue cites).

What it does

train_on_responses_only(trainer, instruction_part=..., response_part=...,
                        mask_out_tokens=["</think>"])

After response spans are unmasked, each listed string is re-masked to −100 wherever its token-id
sequence occurs. Matching is by tokenized id sequence plus a leading-space variant (covers
SentencePiece-style in-context merges); atomic added tokens like </think> always match exactly.
The scan is O(n) integer comparisons inside the existing per-example closure; sequences are
precomputed once outside it.

Verification (real Qwen3-0.6B chat render, two assistant turns)

  • Baseline behavior preserved: without the kwarg, output labels are identical to current main
    (asserted elementwise on the rendered example).
  • The motivating case: current main trains on </think> (id 151668) inside the kept response;
    with mask_out_tokens=["</think>"] the label diff is exactly that position set to −100 — nothing
    else changes.
  • Multi-token strings: mask_out_tokens=["Final answer:"] masks exactly the trainable
    positions of both in-context occurrences (matched via the bare [19357, 4226, 25] and
    space-prefixed [13023, 4226, 25] variants), all set to −100, nothing else touched.

Happy to add a unit test in your suite mirroring the above (render → label-diff assertions) if
you'd like it in-tree — say the word and I'll push it to this branch.

Re-mask listed token strings to -100 inside kept response spans (e.g.
mask_out_tokens=["</think>"], the Nemotron-Ultra recipe from
unslothai/unsloth#6695). Each entry matches as its tokenized id sequence,
plus a leading-space variant for SentencePiece-style in-context merges;
atomic added tokens always match exactly. No behavior change when the
kwarg is absent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mask_out_tokens parameter to the train_on_responses_only function in unsloth_zoo/dataset_utils.py, allowing specific token sequences (like </think>) to be masked out even within kept response spans. The reviewer suggests adding explicit type validation for mask_out_tokens to prevent potential runtime errors if invalid types are passed.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +389 to +397
mask_out_sequences = []
if mask_out_tokens:
if isinstance(mask_out_tokens, str): mask_out_tokens = [mask_out_tokens]
for token_string in mask_out_tokens:
for candidate in dict.fromkeys((token_string, " " + token_string)):
ids = tokenizer(candidate, add_special_tokens = False).input_ids
if ids and ids not in mask_out_sequences:
mask_out_sequences.append(ids)
pass

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure robust defensive programming and prevent cryptic runtime errors (such as TypeError during string concatenation or iteration), we should explicitly validate that mask_out_tokens is either a string or a sequence of strings.

    mask_out_sequences = []
    if mask_out_tokens:
        if isinstance(mask_out_tokens, str):
            mask_out_tokens = [mask_out_tokens]
        elif not isinstance(mask_out_tokens, (list, tuple)):
            raise TypeError("Unsloth: mask_out_tokens must be a string, list, or tuple of strings.")
        for token_string in mask_out_tokens:
            if not isinstance(token_string, str):
                raise TypeError(f"Unsloth: mask_out_tokens elements must be strings, but got {type(token_string).__name__}")
            for candidate in dict.fromkeys((token_string, " " + token_string)):
                ids = tokenizer(candidate, add_special_tokens = False).input_ids
                if ids and ids not in mask_out_sequences:
                    mask_out_sequences.append(ids)
    pass

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85a5fa6034

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

return_function = False, # Useful for iterating over lists
num_proc = None,
last_response_only = False, # Train only on the last assistant turn
mask_out_tokens = None, # e.g. ["</think>"] - also mask these inside kept responses

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate mask_out_tokens to vision collators

When callers use this new kwarg with a vision trainer, train_on_responses_only takes the later _is_vision_collator branch and assigns data_collator.train_on_responses_only from a recursive call at lines 695-702, but that call omits mask_out_tokens, so the inner closure builds an empty mask_out_sequences. This means VLM response-only training still includes </think> or any requested token in the loss despite accepting mask_out_tokens=[...]; pass the kwarg through the collator masking setup as well.

Useful? React with 👍 / 👎.

@pjordanandrsn pjordanandrsn Jul 3, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 76587db — the vision-collator recursive call now passes mask_out_tokens through, so the VLM path applies the same re-masking instead of silently ignoring the kwarg. Text-path label output is regression-identical.

return_function = False, # Useful for iterating over lists
num_proc = None,
last_response_only = False, # Train only on the last assistant turn
mask_out_tokens = None, # e.g. ["</think>"] - also mask these inside kept responses

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Forward mask_out_tokens through the MLX wrapper

When using the MLX API (unsloth_zoo.mlx.trainer.train_on_responses_only), this newly added parameter cannot be used because the wrapper signature at unsloth_zoo/mlx/trainer.py:2877-2886 and its call into this helper at 2933-2941 were not updated. Passing mask_out_tokens=[...] through the documented MLX mirror of the HF/Unsloth API raises TypeError before it reaches this implementation, so MLX training cannot apply the new masking behavior.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 76587db — the MLX wrapper signature now accepts mask_out_tokens and forwards it to the HF implementation, mirroring the rest of the API surface.

…r and MLX paths

The vision branch's recursive call omitted the new kwarg (silently ignoring
it for VLM training) and the MLX wrapper's signature rejected it with a
TypeError. Pass it through both. Text-path output is unchanged (label-diff
regression identical).

Addresses both Codex P2 review findings on unslothai#852.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant